Skip to content

hashmap: Store hashes as usize internally #36595

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 1 commit into from
Nov 1, 2016

Conversation

bluss
Copy link
Member

@bluss bluss commented Sep 20, 2016

We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.

Fixes #36567

@rust-highfive
Copy link
Contributor

r? @brson

(rust_highfive has picked a reviewer for you, use r? to override)

@bluss
Copy link
Member Author

bluss commented Sep 20, 2016

Needs benchmarks and testing on 32-bit.

@arthurprs
Copy link
Contributor

arthurprs commented Sep 24, 2016

-- updated benchmarks bellow

@alexcrichton
Copy link
Member

@bluss we discussed this briefly at libs triage today and were curious, do you have some representative numbers as well about this change? (or @arthurprs do you have some?) Would be useful to see the comparisons!

I was personally a little worried about losing 32 bits of a 64-bit hash on 32-bit platforms, but so long as we generally recommend a uniform distribution of bits I think it'll work out.

// We need to avoid 0 in order to prevent collisions with
// EMPTY_HASH. We can maintain our precious uniform distribution
// of initial indexes by unconditionally setting the MSB,
// effectively reducing 64-bits hashes to 63 bits.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment needs updating because of the 32 -> 31?

@arthurprs
Copy link
Contributor

arthurprs commented Sep 27, 2016

Here are some x86 benchmarks. Keep in mind that most of these are <usize, usize> which is the best case for this change (memory footprint is 25% smaller).

What I don't get is why in the hhkkvv layout the iteration becomes slower, I'll try to dig something from the assembly.

-- benchmarks remove look for updates bellow

@bluss
Copy link
Member Author

bluss commented Sep 27, 2016

I wanted to know how it influences bootstrap time on both 64-bit and 32-bit, it would be a good metric. I don't have the resources to do the comparison, so I left this here to ask if someone wanted to try it.

Ah, the benchmarks are for kkvv layout (current) and kvkv (your pr)

@arthurprs
Copy link
Contributor

I looked at the disassembly but I can't see anything that would make iteration that much slower. From an algorithm POV it makes no sense to me.

@alexcrichton
Copy link
Member

@arthurprs those numbers seem to be mostly related to performance, but isn't the theory behind this change that it mostly saves memory?

@arthurprs
Copy link
Contributor

@alexcrichton sure, I just happened to have these around due to the other PR. And the decrease in cache pressure shows up in these.

As for memory benefits it's easy to calculate as a function of sizeof(K) and sizeof(V) as it doesn't depend on other factors like padding etc.

sz(K)+sz(V) 4 8 16 24 32 64 128 256
(x+4)/(x+8) 0.667 0.75 0.833 0.875 0.9 0.944 0.971 0.985

@bluss
Copy link
Member Author

bluss commented Sep 29, 2016

@alexcrichton Smaller data means better performance; more hashes fit into a cache line during lookup. So performance was my main thought.

@arthurprs
Copy link
Contributor

Dealing with so many benchmarks is tricky. I found a problem so I'm repeating the benchmarks, will post fixed and expanded updates soon.

@arthurprs
Copy link
Contributor

arthurprs commented Sep 29, 2016

Here's the updated results. And the gist with even more comparisons and better visualization https://gist.github.com/arthurprs/9f28847dceee86bd5cfffcd30d9cd6cc

Code: https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize

➜  hashmap2 git:(layout_usize) ✗ cargo benchcmp hhkkvv_u64:: hhkkvv_usz:: x86.txt
 name                            hhkkvv_u64:: ns/iter  hhkkvv_usz:: ns/iter  diff ns/iter   diff % 
 grow_100_000                    1,288,426             1,347,922                   59,496    4.62% 
 grow_10_000                     1,302,671             1,351,398                   48,727    3.74% 
 grow_big_value_100_000          38,371,161            37,769,065                -602,096   -1.57% 
 grow_big_value_10_000           4,054,619             4,437,890                  383,271    9.45% 
 grow_fnv_10_000                 433,147               380,392                    -52,755  -12.18% 
 insert_100                      5,215                 4,970                         -245   -4.70% 
 insert_1000                     48,039                44,946                      -3,093   -6.44% 
 insert_100_000                  6,843,047             7,122,700                  279,653    4.09% 
 insert_10_000                   532,906               530,855                     -2,051   -0.38% 
 insert_1_000_000                124,166,274           124,684,470                518,196    0.42% 
 insert_int_bigvalue_10_000      1,440,912             1,390,695                  -50,217   -3.49% 
 insert_str_10_000               626,603               595,789                    -30,814   -4.92% 
 insert_string_10_000            1,370,791             1,371,131                      340    0.02% 
 iter_keys_100_000               371,087               383,975                     12,888    3.47% 
 iter_keys_1_000_000             8,448,411             8,579,116                  130,705    1.55% 
 iter_keys_big_value_100_000     352,423               393,528                     41,105   11.66% 
 iter_keys_big_value_1_000_000   8,386,089             8,615,999                  229,910    2.74% 
 iter_values_100_000             392,468               545,712                    153,244   39.05% 
 iter_values_1_000_000           8,797,970             11,673,560               2,875,590   32.68% 
 iterate_100_000                 394,909               552,853                    157,944   40.00% 
 iterate_1_000_000               8,778,111             11,821,943               3,043,832   34.68% 
 lookup_100_000                  328,320               313,771                    -14,549   -4.43% 
 lookup_100_000_bigvalue         319,496               316,429                     -3,067   -0.96% 
 lookup_10_000                   265,687               252,238                    -13,449   -5.06% 
 lookup_10_000_bigvalue          281,491               261,869                    -19,622   -6.97% 
 lookup_10_000_exist             257,790               246,435                    -11,355   -4.40% 
 lookup_10_000_noexist           273,700               273,483                       -217   -0.08% 
 lookup_1_000_000                257,266               251,604                     -5,662   -2.20% 
 lookup_1_000_000_bigvalue       264,313               261,944                     -2,369   -0.90% 
 lookup_1_000_000_bigvalue_unif  684,819               740,230                     55,411    8.09% 
 lookup_1_000_000_unif           598,199               619,402                     21,203    3.54% 
 merge_shuffle                   1,604,293             1,636,942                   32,649    2.04% 
 merge_simple                    61,494,395            42,660,176             -18,834,219  -30.63% 
 new                             9                     9                                0    0.00% 
 with_capacity_10e5              2,537                 1,308                       -1,229  -48.44% 
➜  hashmap2 git:(layout_usize) ✗ cargo benchcmp hhkvkv_u64:: hhkvkv_usz:: x86.txt 
 name                            hhkvkv_u64:: ns/iter  hhkvkv_usz:: ns/iter  diff ns/iter   diff % 
 grow_100_000                    1,162,181             1,138,593                  -23,588   -2.03% 
 grow_10_000                     1,158,647             1,084,507                  -74,140   -6.40% 
 grow_big_value_100_000          37,001,012            36,665,075                -335,937   -0.91% 
 grow_big_value_10_000           3,437,647             3,538,101                  100,454    2.92% 
 grow_fnv_10_000                 419,084               349,202                    -69,882  -16.67% 
 insert_100                      4,823                 4,556                         -267   -5.54% 
 insert_1000                     46,000                45,223                        -777   -1.69% 
 insert_100_000                  6,664,378             6,213,870                 -450,508   -6.76% 
 insert_10_000                   534,694               510,209                    -24,485   -4.58% 
 insert_1_000_000                123,740,627           107,358,807            -16,381,820  -13.24% 
 insert_int_bigvalue_10_000      1,651,770             1,539,210                 -112,560   -6.81% 
 insert_str_10_000               599,655               580,417                    -19,238   -3.21% 
 insert_string_10_000            1,382,560             1,363,848                  -18,712   -1.35% 
 iter_keys_100_000               361,889               339,290                    -22,599   -6.24% 
 iter_keys_1_000_000             8,312,419             8,026,248                 -286,171   -3.44% 
 iter_keys_big_value_100_000     526,549               584,094                     57,545   10.93% 
 iter_keys_big_value_1_000_000   9,904,451             9,810,863                  -93,588   -0.94% 
 iter_values_100_000             366,842               338,926                    -27,916   -7.61% 
 iter_values_1_000_000           8,532,860             8,143,969                 -388,891   -4.56% 
 iterate_100_000                 367,828               342,287                    -25,541   -6.94% 
 iterate_1_000_000               8,542,904             8,187,025                 -355,879   -4.17% 
 lookup_100_000                  323,857               307,916                    -15,941   -4.92% 
 lookup_100_000_bigvalue         342,299               323,328                    -18,971   -5.54% 
 lookup_10_000                   262,727               251,569                    -11,158   -4.25% 
 lookup_10_000_bigvalue          282,536               264,107                    -18,429   -6.52% 
 lookup_10_000_exist             252,367               244,794                     -7,573   -3.00% 
 lookup_10_000_noexist           273,349               273,088                       -261   -0.10% 
 lookup_1_000_000                284,377               248,260                    -36,117  -12.70% 
 lookup_1_000_000_bigvalue       268,109               262,584                     -5,525   -2.06% 
 lookup_1_000_000_bigvalue_unif  660,544               687,425                     26,881    4.07% 
 lookup_1_000_000_unif           517,636               535,840                     18,204    3.52% 
 merge_shuffle                   1,488,137             1,400,964                  -87,173   -5.86% 
 merge_simple                    38,786,908            31,456,907              -7,330,001  -18.90% 
 new                             9                     9                                0    0.00% 
 with_capacity_10e5              2,500                 1,350                       -1,150  -46.00%

@alexcrichton
Copy link
Member

@arthurprs hm some of those benchmarks seem worrisome indicating that iteration gets 40% slower? Is that just an outlier though?

@arthurprs
Copy link
Contributor

@alexcrichton yeah, it's super odd (and makes no sense to me) but I can reproduce it every time here. I was hopping somebody else could run the benchmarks.

It's just a matter of checking out these branches (https://github.com/arthurprs/hashmap2/tree/layout and https://github.com/arthurprs/hashmap2/tree/layout_usize) and running something like
cargo bench hhkkvv::iter --target=i686-unknown-linux-gnu

@alexcrichton
Copy link
Member

@rfcbot fcp merge

Another neat improvement to our hash tables, and seems like the numbers back it up?

@bors
Copy link
Collaborator

bors commented Oct 7, 2016

☔ The latest upstream changes (presumably #36753) made this pull request unmergeable. Please resolve the merge conflicts.

@alexbool
Copy link
Contributor

rfcbot stuck?

@bluss
Copy link
Member Author

bluss commented Oct 12, 2016

I'm waiting for @arthurprs's hashmap change to merge first before updating. I think the benchmarks said this was a universal win for that memory layout(?), so it should be a simple decision then.

@alexcrichton
Copy link
Member

Ah yeah we've discussed this as well, so it's ok to r+ when ready to go.

@bluss bluss force-pushed the hashmap-usize-for-hash branch from 4283919 to 6707668 Compare October 17, 2016 13:50
We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.
@bluss bluss force-pushed the hashmap-usize-for-hash branch from 6707668 to 13a1f21 Compare October 17, 2016 13:55
@nnethercote
Copy link
Contributor

Measuring with compare.py from rustc-benchmarks would be useful here.

@alexcrichton
Copy link
Member

@bluss this is ready to go now, right?

@bluss
Copy link
Member Author

bluss commented Oct 31, 2016

It is ready to go. I haven't invested more time in benchmarking this, and that wasn't the plan from the start either; I wanted to simply implement this and give to rustc developers if they were interested. @arthurprs's benchmarks show surprisingly noticeable gains, so I'm happy.

@alexcrichton
Copy link
Member

@bors: r+

Ok, thanks!

@bors
Copy link
Collaborator

bors commented Oct 31, 2016

📌 Commit 13a1f21 has been approved by alexcrichton

@bors
Copy link
Collaborator

bors commented Nov 1, 2016

⌛ Testing commit 13a1f21 with merge 265ab65...

bors added a commit that referenced this pull request Nov 1, 2016
hashmap: Store hashes as usize internally

We can't use more than usize's bits of a hash to select a bucket anyway,
so we only need to store that part in the table. This should be an
improvement for the size of the data structure on 32-bit platforms.
Smaller data means better cache utilization and hopefully better
performance.

Fixes #36567
@bors bors merged commit 13a1f21 into rust-lang:master Nov 1, 2016
@bluss bluss deleted the hashmap-usize-for-hash branch November 1, 2016 08:42
@arthurprs
Copy link
Contributor

arthurprs commented Nov 1, 2016

The gains very likely come from smaller hash array memory footprint and cheaper displacement() calculation. The later is in a very hot code path.

Edit: actually the displacement may not affected but the patch is probably saving 64bit arithmetic and freeing an extra register in other places.

@brson brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Nov 8, 2016
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants